Abstract



We prove that the optimal assignment kernel, proposed recently as
an attempt to embed labeled graphs and, more generally, tuples of basic
data into a Hilbert space, is in fact not always positive definite.

1 Introduction 

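Let $\mathcal{X}$ be a set, and $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ a symmetric function that satisfies, for any $n \in \mathbb{N}$, any $(a_1, \ldots, a_n) \in \mathbb{R}^n$ and any $(x_1, \ldots, x_n) \in \mathcal{X}^n$:

$$
\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \, k(x_i, x_j) \geq 0 .
$$

Such a function is called a positive definite kernel on $\mathcal{X}$. A famous result by [1] states the equivalence between the definition of a positive definite kernel and the embedding of $\mathcal{X}$ in a Hilbert space, in the sense that $k$ is a positive definite kernel on $\mathcal{X}$ if and only if there exists a Hilbert space $\mathcal{H}$ with inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$ and a mapping $\Phi : \mathcal{X} \to \mathcal{H}$ such that, for any $x, x' \in \mathcal{X}$, it holds that:

$$
k(x, x') = \langle \Phi(x), \Phi(x') \rangle_{\mathcal{H}} . \tag{1}
$$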

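On a finite sample $x_1, \ldots, x_n$, the condition $\sum_{i,j} a_i a_j k(x_i, x_j) \geq 0$ holds for every real vector $a$ exactly when the Gram matrix $K_{ij} = k(x_i, x_j)$ is positive semidefinite, i.e. when its smallest eigenvalue is nonnegative. Since only finitely many points can ever be tested, such a check can refute positive definiteness but never establish it. A minimal sketch, assuming NumPy; the helper names are hypothetical:

```python
import numpy as np

def gram_matrix(kernel, points):
    """Return the n x n matrix K with K[i, j] = kernel(points[i], points[j])."""
    n = len(points)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(points[i], points[j])
    return K

def passes_pd_test_on_sample(kernel, points, tol=1e-10):
    """Check sum_{i,j} a_i a_j k(x_i, x_j) >= 0 for every real vector a on this sample.

    For a symmetric Gram matrix K this is equivalent to min eigenvalue(K) >= 0;
    tol absorbs floating-point round-off.
    """
    K = gram_matrix(kernel, points)
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

def gaussian(x, y):
    return np.exp(-(x - y) ** 2)  # Gaussian RBF on R: positive definite

def abs_diff(x, y):
    return abs(x - y)  # symmetric, but its zero diagonal forces a negative eigenvalue

points = [0.0, 0.5, 1.3, 2.0]
print(passes_pd_test_on_sample(gaussian, points))  # True
print(passes_pd_test_on_sample(abs_diff, points))  # False
```

The `abs_diff` case fails for a simple reason: its Gram matrix has zero trace but is not the zero matrix, so at least one eigenvalue must be negative.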
The construction of positive definite kernels on various sets $\mathcal{X}$ has recently received a lot of attention in statistics and machine learning, because they allow the use of a variety of algorithms for pattern recognition, regression or outlier detection for sets of points in $\mathcal{X}$ [5, 3]. These algorithms, collectively referred to as kernel methods, can be thought of as multivariate linear methods that can be performed on the Hilbert space implicitly defined by any positive definite kernel $k$ through (1), because they only access data through inner products, hence through the kernel. This "kernel trick" allows one, for example, to perform supervised classification or regression on strings or graphs with state-of-the-art statistical methods as soon as a positive definite kernel for strings or graphs is defined. Unsurprisingly, this has triggered a lot of activity focused on the design of specific positive definite kernels for specific data, such as strings and graphs for applications in bioinformatics and natural language processing [1].

Motivated by applications in computational chemistry, [2] recently proposed a kernel for labeled graphs, and more generally for structured data that can be decomposed into subparts. The kernel, called the optimal assignment kernel, measures the similarity between two data points by performing an optimal matching between the subparts of both points. It translates a natural notion of similarity between graphs, and can be computed efficiently with the Hungarian algorithm. However, we show below that it is in general not positive definite, which suggests that special care may be needed before using it with kernel methods.
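For two tuples $x$ and $y$, $k_A(x, y)$ is the largest total base-kernel similarity achievable by matching each element of the shorter tuple to a distinct element of the longer one (Definition 1 below gives the formal version). The Hungarian algorithm solves this assignment problem in polynomial time; the minimal sketch below instead enumerates every matching, which is exponential and only meant to make the definition concrete. The function names and the Gaussian base kernel on $\mathbb{R}^2$ are illustrative assumptions:

```python
import math
from itertools import permutations

def optimal_assignment_kernel(x, y, k1):
    """Brute-force k_A(x, y) for tuples x, y and a symmetric base kernel k1.

    Every element of the shorter tuple is matched to a distinct element of the
    longer one, and the total base-kernel similarity is maximized. Enumerating
    all matchings is exponential; this only makes the definition concrete.
    """
    if len(x) > len(y):
        x, y = y, x  # k1 is symmetric, so the roles of x and y can be swapped
    best = -math.inf
    # Each `match` is an injective map from the indices of x into the indices of y.
    for match in permutations(range(len(y)), len(x)):
        best = max(best, sum(k1(xi, y[j]) for xi, j in zip(x, match)))
    return best

def gaussian(u, w, gamma=1.0):
    """Gaussian RBF base kernel on points of R^2 (positive definite, nonnegative)."""
    return math.exp(-gamma * ((u[0] - w[0]) ** 2 + (u[1] - w[1]) ** 2))

A, B, C, D = (0, 0), (1, 0), (1, 1), (0, 1)
print(optimal_assignment_kernel((A, B), (C, D), gaussian))  # 2 * exp(-1), about 0.736
```

In practice one would hand the matrix of pairwise $k_1$ values to an assignment solver rather than enumerate permutations; the result is the same.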

It should be pointed out that not being positive definite is not necessarily a big issue for the use of this kernel in practice. First, it may in fact be positive definite when restricted to the particular set of data used in a practical experiment. Second, other non-positive-definite kernels, such as the sigmoid kernel, have been shown to be very useful and efficient in combination with kernel methods. Third, practitioners of kernel methods have developed a variety of strategies to limit the possible dysfunction of kernel methods when non-positive-definite kernels are used, such as projecting the Gram matrix of pairwise kernel values onto the set of positive semidefinite matrices before processing it. The good results reported on several chemoinformatics benchmarks in [2] indeed confirm the usefulness of the method. Hence our message in this note is certainly not to criticize the use of the optimal assignment kernel in the context of kernel methods. Instead, we wish to warn that in some cases negative eigenvalues may appear in the Gram matrix and specific care may be needed, and simultaneously to help limit the propagation of errors in the scientific literature.
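One of the strategies mentioned above, projecting the Gram matrix onto the set of positive semidefinite matrices, can be sketched as eigenvalue clipping: setting the negative eigenvalues of a symmetric matrix to zero gives its nearest positive semidefinite matrix in Frobenius norm. This is one common choice among several (others shift or flip the spectrum) and is not taken from [2]; the function name is hypothetical and NumPy is assumed:

```python
import numpy as np

def project_to_psd(K):
    """Nearest positive semidefinite matrix to K in Frobenius norm.

    K is first symmetrized, then its negative eigenvalues are replaced by zero.
    """
    K = (K + K.T) / 2  # guard against small asymmetries from round-off
    eigenvalues, eigenvectors = np.linalg.eigh(K)
    clipped = np.clip(eigenvalues, 0.0, None)
    # V diag(clipped) V^T, written with broadcasting over the columns of V.
    return (eigenvectors * clipped) @ eigenvectors.T

K = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3 and -1
print(project_to_psd(K))  # approximately [[1.5, 1.5], [1.5, 1.5]]
```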
2 Main result



Let us first define formally the optimal assignment kernel of [2]. We assume we are given a set $\mathcal{X}'$, endowed with a positive definite kernel $k_1$ that takes only nonnegative values. The objects we consider are tuples of elements of $\mathcal{X}'$, i.e., an object $x$ decomposes as $x = (x_1, \ldots, x_n)$, where $n$ is the length of the tuple $x$, denoted $|x|$, and $x_1, \ldots, x_n \in \mathcal{X}'$. We denote by $\mathcal{X}$ the set of all tuples of elements of $\mathcal{X}'$. Let $S_n$ be the symmetric group, i.e., the set of permutations of $n$ elements. We now recall the kernel on $\mathcal{X}$ proposed in [2]:

Definition 1. The optimal assignment kernel $k_A : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is defined, for any $x, y \in \mathcal{X}$, by:

$$
k_A(x, y) =
\begin{cases}
\max_{\pi \in S_{|y|}} \sum_{i=1}^{|x|} k_1\left(x_i, y_{\pi(i)}\right) & \text{if } |y| > |x|, \\[4pt]
\max_{\pi \in S_{|x|}} \sum_{i=1}^{|y|} k_1\left(x_{\pi(i)}, y_i\right) & \text{otherwise.}
\end{cases}
$$

We can now state our main theorem.

Theorem 1. The optimal assignment kernel is not always positive definite.

Before proving this result, we make a few comments.

Remark 1. The meaning of the statement "not always" in Theorem 1 is that there exist choices of $\mathcal{X}'$ and $k_1$ such that the optimal assignment kernel is positive definite, while there also exist choices for which it is not positive definite.

Remark 2. Theorem 1 contradicts Theorem 2.3 in [2], which claims that the optimal assignment kernel is always positive definite. The proof of Theorem 2.3 in [2], however, contains the following error. Using the notations of [2], the authors define in the course of their proof the values

$$
A := 2 \sum_{j=1}^{n} v_{n+1} v_j K_{n+1,j} \quad \text{and} \quad B := v_{n+1}^2 K_{n+1,n+1} .
$$

They show that $A < B$, on the one hand, and that $A < 0$, on the other hand. From this they conclude that $B < 0$, which is obviously not a valid logical conclusion.

In order to prove Theorem 1, we now provide an example of a pair $(\mathcal{X}', k_1)$ that leads to a positive definite optimal assignment kernel, and another example that leads to the opposite conclusion.

Lemma 1. Let $\mathcal{X}' = \{1\}$ be a singleton, and $k_1(1, 1) = 1$. Then the optimal assignment kernel is positive definite.

Proof. When $\mathcal{X}' = \{1\}$, the tuples are simply repeats of the unique element, hence each element $x = (1, \ldots, 1) \in \mathcal{X}$ is uniquely defined by its length $|x| \in \mathbb{N}$. The optimal assignment kernel is then given by:

$$
k_A(x, y) = \min(|x|, |y|) .
$$

The function $\min(a, b)$ is known to be positive definite on $\mathbb{N}$, therefore $k_A$ is a valid kernel on $\mathcal{X}$. □
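The positive definiteness of $\min$ on $\mathbb{N}$ can be seen through an explicit embedding of the form (1): map $n$ to the indicator vector $\phi(n)$ of $\{1, \ldots, n\}$, so that $\langle \phi(m), \phi(n) \rangle = \min(m, n)$. A small numerical check of this identity, with arbitrarily chosen lengths and a hypothetical helper `indicator_feature` (NumPy assumed):

```python
import numpy as np

def indicator_feature(n, dim):
    """phi(n) in R^dim: ones in the first n coordinates, zeros elsewhere."""
    phi = np.zeros(dim)
    phi[:n] = 1.0
    return phi

lengths = [1, 2, 3, 5, 8]  # sample tuple lengths |x| (arbitrary)
dim = max(lengths)
features = np.stack([indicator_feature(n, dim) for n in lengths])

gram_from_features = features @ features.T  # entries <phi(m), phi(n)>
gram_min = np.minimum.outer(lengths, lengths).astype(float)  # entries min(m, n)

print(np.array_equal(gram_from_features, gram_min))  # True
print(np.linalg.eigvalsh(gram_min).min() >= -1e-10)  # True
```

Because the Gram matrix is an inner-product matrix of explicit vectors, it is positive semidefinite for every choice of lengths, not only for this sample.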

Lemma 2. Let $\mathcal{X}' = \mathbb{R}^2$ and $k_1(x, y) = \exp\left(-\gamma \|x - y\|^2\right)$, for $x, y \in \mathbb{R}^2$ and $\gamma > 0$. Then the optimal assignment kernel is not positive definite.

Proof. The function $k_1$ defined in Lemma 2 is the well-known Gaussian radial basis function kernel, which is known to be positive definite and only takes nonnegative values; hence it satisfies all the hypotheses needed in the definition of the optimal assignment kernel. In order to show that the latter is not positive definite, we exhibit a set of points in $\mathcal{X}$ that cannot be embedded in a Hilbert space through (1). For this, let us start with four points that form a square in $\mathcal{X}'$, e.g., $A = (0,0)$, $B = (1,0)$, $C = (1,1)$ and $D = (0,1)$ (Figure 1). Denoting $a := \exp(-\gamma)$, we directly obtain from the definition of $k_1$ that:

$$
\begin{cases}
k_1(A,A) = k_1(B,B) = k_1(C,C) = k_1(D,D) = 1, \\
k_1(A,B) = k_1(B,C) = k_1(C,D) = k_1(D,A) = a, \\
k_1(A,C) = k_1(B,D) = a^2 .
\end{cases}
$$

Figure 1: Four points in $\mathcal{X}' = \mathbb{R}^2$, endowed with the positive definite kernel $k_1(x,y) = \exp\left(-\gamma \|x - y\|^2\right)$.

In the space $\mathcal{X}$ of tuples, let us now consider the six 2-tuples obtained by taking all pairs of distinct points: $AB$, $AC$, $AD$, $BC$, $BD$, $CD$. Using the definition of the optimal assignment kernel, $k(uv, wt) = \max\left(k_1(u,w) + k_1(v,t),\ k_1(u,t) + k_1(v,w)\right)$ for $u, v, w, t \in \{A, B, C, D\}$, we easily obtain:

$$
\begin{cases}
k(AB,AB) = k(AC,AC) = k(AD,AD) = k(BC,BC) = k(BD,BD) = k(CD,CD) = 2, \\
k(AB,AC) = k(AB,BD) = k(BC,BD) = k(BC,AC) = k(CD,AC) = k(CD,BD) = k(AD,AC) = k(AD,BD) = 1 + a, \\
k(AB,BC) = k(BC,CD) = k(CD,AD) = k(AB,AD) = 1 + a^2, \\
k(AB,CD) = k(AD,BC) = k(AC,BD) = 2a .
\end{cases}
$$
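Each of these values follows by deciding which of the two matchings attains the maximum. Since $\gamma > 0$ gives $0 < a < 1$, the only comparisons needed are $1 + a > a + a^2$, $1 + a^2 > 2a$ (because $1 + a^2 - 2a = (1-a)^2 > 0$) and $2a > 2a^2$. One representative of each line of the table:

$$
\begin{aligned}
k(AB, AC) &= \max\left(k_1(A,A) + k_1(B,C),\ k_1(A,C) + k_1(B,A)\right) = \max\left(1 + a,\ a^2 + a\right) = 1 + a, \\
k(AB, BC) &= \max\left(k_1(A,B) + k_1(B,C),\ k_1(A,C) + k_1(B,B)\right) = \max\left(2a,\ a^2 + 1\right) = 1 + a^2, \\
k(AB, CD) &= \max\left(k_1(A,C) + k_1(B,D),\ k_1(A,D) + k_1(B,C)\right) = \max\left(2a^2,\ 2a\right) = 2a .
\end{aligned}
$$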

If $k$ were positive definite, then these six 2-tuples could be embedded into a Hilbert space $\mathcal{H}$ by a mapping $\Phi : \mathcal{X} \to \mathcal{H}$ satisfying (1). Let us show that this is impossible. Let $d(x,y) = \|\Phi(x) - \Phi(y)\|_{\mathcal{H}}$ be the Hilbert distance between two points $x, y \in \mathcal{X}$ after their embedding in $\mathcal{H}$. It can be computed from the kernel values by the classical equality:

$$
d(x,y)^2 = k(x,x) + k(y,y) - 2k(x,y) .
$$
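The squared distances used in the rest of the proof follow mechanically from the equality $d(x,y)^2 = k(x,x) + k(y,y) - 2k(x,y)$ and the kernel values on the six 2-tuples. The sketch below recomputes them directly from the coordinates of $A, B, C, D$; the choice $\gamma = 1$ is arbitrary and the helper names are hypothetical:

```python
import math

gamma = 1.0  # arbitrary; the argument holds for any gamma > 0
corners = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}

def k1(u, w):
    """Gaussian RBF kernel exp(-gamma * ||u - w||^2) between two named corners."""
    (x1, y1), (x2, y2) = corners[u], corners[w]
    return math.exp(-gamma * ((x1 - x2) ** 2 + (y1 - y2) ** 2))

def k(p, q):
    """Optimal assignment kernel between the 2-tuples p = "uv" and q = "wt"."""
    (u, v), (w, t) = p, q
    return max(k1(u, w) + k1(v, t), k1(u, t) + k1(v, w))

def d2(p, q):
    """Squared Hilbert distance implied by k: k(p, p) + k(q, q) - 2 k(p, q)."""
    return k(p, p) + k(q, q) - 2 * k(p, q)

a = math.exp(-gamma)
# Right angle at AC: d(AB,CD)^2 = d(AB,AC)^2 + d(AC,CD)^2 = 4 - 4a.
print(math.isclose(d2("AB", "CD"), d2("AB", "AC") + d2("AC", "CD")))  # True
# The would-be square AB, BC, CD, AD has squared side 2 - 2a^2, so its
# squared diagonal would be 4 - 4a^2, which differs from d(AB,CD)^2 = 4 - 4a.
print(math.isclose(2 * d2("AB", "BC"), d2("AB", "CD")))  # False
```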

We first observe that $d(AB,AC)^2 = d(AC,CD)^2 = 2 - 2a$, and $d(AB,CD)^2 = 4 - 4a$. Therefore,

$$
d(AB,CD)^2 = d(AB,AC)^2 + d(AC,CD)^2 ,
$$

from which we conclude that $(AB, AC, CD)$ form a half-square, with hypotenuse $(AB, CD)$. A similar computation shows that $(AB, BD, CD)$ is also a half-square with hypotenuse $(AB, CD)$. Moreover,

$$
d(AC,BD)^2 = 4 - 4a = d(AB,CD)^2 ,
$$

which shows that the four points $(AB, AC, CD, BD)$ are in fact coplanar and form a square. The same computation when $AB$ and $CD$ are respectively replaced by $AD$ and $BC$ shows that the four points $(AD, AC, BC, BD)$ are also coplanar and also form a square. Hence all six points can be embedded in 3 dimensions, and the points $(AB, AD, CD, BC)$ are themselves coplanar and must form a rectangle on the plane equidistant from $AC$ and $BD$ (Figure 2). The edges of this rectangle all have the same length, $d(AB,BC)^2 = d(BC,CD)^2 = d(CD,AD)^2 = d(AD,AB)^2 = 2 - 2a^2$, so it is in fact a square, whose diagonal $(AB, CD)$ should have length $\sqrt{4 - 4a^2}$. However, a direct computation gives $d(AB,CD) = \sqrt{4 - 4a}$, which provides a contradiction since $0 < a < 1$ implies $a^2 < a$. Hence the six points cannot be embedded into a Hilbert space with $k$ as inner product, which shows that $k$ is not positive definite on $\mathcal{X}$. □
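The same conclusion can be read off the spectrum: if $k$ were positive definite, the $6 \times 6$ Gram matrix of the six 2-tuples would have no negative eigenvalue. This numerical check is not part of the proof above; it assumes NumPy, uses the arbitrary value $\gamma = 1$, and recomputes every kernel value from the coordinates:

```python
import math
import numpy as np

gamma = 1.0  # arbitrary; any gamma > 0 gives a negative eigenvalue
corners = {"A": np.array([0.0, 0.0]), "B": np.array([1.0, 0.0]),
           "C": np.array([1.0, 1.0]), "D": np.array([0.0, 1.0])}
pairs = ["AB", "AC", "AD", "BC", "BD", "CD"]

def k1(u, w):
    """Gaussian RBF kernel between two named corners of the unit square."""
    diff = corners[u] - corners[w]
    return math.exp(-gamma * float(diff @ diff))

def k(p, q):
    """Optimal assignment kernel between the 2-tuples p = "uv" and q = "wt"."""
    (u, v), (w, t) = p, q
    return max(k1(u, w) + k1(v, t), k1(u, t) + k1(v, w))

K = np.array([[k(p, q) for q in pairs] for p in pairs])
eigenvalues = np.linalg.eigvalsh(K)  # sorted in ascending order
print(eigenvalues[0])  # negative, roughly -0.16 for gamma = 1
```

Working through the symmetries of the square, the negative eigenvalue comes from the two-dimensional subspace spanned by the indicator of the four "edge" pairs and the indicator of the two "diagonal" pairs; on that subspace the determinant is $4a(a^2 - 1) < 0$, so exactly one eigenvalue is negative for every $\gamma > 0$.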
Figure 2: The necessary configuration of the six 2-tuples if an embedding with the optimal assignment kernel were possible.